Cooperation and Social Dilemmas with Reinforcement Learning
Cooperation between humans has been foundational for the development of civilisation, and yet many questions remain about how it emerges from social interactions.
As artificial agents begin to play a more significant role in our lives and are introduced into our societies, it is apparent that understanding the mechanisms of cooperation is also important for the design of next-generation multi-agent AI systems. Indeed, it is particularly important for supporting cooperation between self-interested AI agents.
In this thesis, we analyse how mechanisms underlying human cooperation can be applied to the training of reinforcement learning agents. Human behaviour is a product of cultural norms, emotions and intuition, amongst other things: we argue that similar mechanisms can be used to deal with the complexities of multi-agent cooperation. We outline the problem of cooperation in mixed-motive games, also known as social dilemmas, and focus on reputation dynamics and partner selection, two mechanisms that have been strongly linked to indirect reciprocity in Evolutionary Game Theory. A key point we want to emphasise is that we assume no prior knowledge or explicit definition of strategies, which are instead fully learnt by the agents during the games.
In our experimental evaluation, we demonstrate the benefits of applying these mechanisms to the training process of the agents, and we compare our findings with results from a variety of other disciplines, including Economics and Evolutionary Biology.
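To make the mixed-motive setting referred to above concrete, the sketch below (an illustration added here, not taken from the thesis) encodes the canonical Prisoner's Dilemma payoffs in Python and checks the inequalities T > R > P > S that define a social dilemma: defection dominates individually, yet mutual cooperation beats mutual defection.

```python
# Canonical Prisoner's Dilemma payoffs for the row player (illustrative values).
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker's payoff
payoff = {
    ("C", "C"): R,  # mutual cooperation
    ("C", "D"): S,  # cooperate against a defector
    ("D", "C"): T,  # defect against a cooperator
    ("D", "D"): P,  # mutual defection
}

# Defection strictly dominates cooperation for a self-interested agent...
assert payoff[("D", "C")] > payoff[("C", "C")]  # T > R
assert payoff[("D", "D")] > payoff[("C", "D")]  # P > S
# ...yet mutual cooperation beats mutual defection, hence the dilemma.
assert payoff[("C", "C")] > payoff[("D", "D")]  # R > P
```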
Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning
Social dilemmas have been widely studied to explain how humans are able to
cooperate in society. Considerable effort has been invested in designing
artificial agents for social dilemmas that incorporate explicit agent
motivations that are chosen to favor coordinated or cooperative responses. The
prevalence of this general approach points towards the importance of
understanding both an agent's internal design and the external environment
dynamics that facilitate cooperative behavior. In this paper, we investigate
how partner selection can promote cooperative behavior between agents who are
trained to maximize a purely selfish objective function. Our experiments reveal
that agents trained with this dynamic learn a strategy that retaliates against
defectors while promoting cooperation with other agents, resulting in a
prosocial society.
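As a rough illustration of the partner-selection dynamic described in this abstract, the sketch below pairs tabular Q-learning agents, each maximising only its own payoff, in a repeated Prisoner's Dilemma where the chooser prefers partners who cooperated in their last interaction. The tabular learner, payoff values and selection heuristic are assumptions made for this sketch, not the paper's implementation.

```python
import random
from collections import defaultdict

ACTIONS = ["C", "D"]
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ALPHA, EPS = 0.1, 0.1  # learning rate and exploration rate (assumed values)

class Agent:
    def __init__(self, name):
        self.name = name
        self.q = defaultdict(float)           # (state, action) -> value
        self.last_action = random.choice(ACTIONS)

    def act(self, partner_last):
        state = partner_last                  # condition on partner's last move
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: self.q[(state, a)])
        return state, action

    def learn(self, state, action, reward):
        key = (state, action)
        self.q[key] += ALPHA * (reward - self.q[key])

def select_partner(others):
    # Purely selfish heuristic: prefer partners who cooperated last round.
    cooperators = [o for o in others if o.last_action == "C"]
    return random.choice(cooperators or others)

agents = [Agent(f"a{i}") for i in range(6)]
for _ in range(5000):
    random.shuffle(agents)
    chooser, pool = agents[0], agents[1:]
    partner = select_partner(pool)
    s1, a1 = chooser.act(partner.last_action)
    s2, a2 = partner.act(chooser.last_action)
    r1, r2 = PAYOFF[(a1, a2)]
    chooser.learn(s1, a1, r1)
    partner.learn(s2, a2, r2)
    chooser.last_action, partner.last_action = a1, a2

cooperators = sum(a.last_action == "C" for a in agents)
print(f"{cooperators} of {len(agents)} agents cooperated in their last round")
```

Because defectors are avoided at the selection stage, defection carries an opportunity cost even though each agent optimises only its own reward, which is the intuition behind the retaliation-plus-cooperation strategy reported in the abstract.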
Cooperation and Reputation Dynamics with Reinforcement Learning
Creating incentives for cooperation is a challenge in natural and artificial
systems. One potential answer is reputation, whereby agents trade the immediate
cost of cooperation for the future benefits of having a good reputation. Game
theoretical models have shown that specific social norms can make cooperation
stable, but how agents can independently learn to establish effective
reputation mechanisms on their own is less understood. We use a simple model of
reinforcement learning to show that reputation mechanisms generate two
coordination problems: agents need to learn how to coordinate on the meaning of
existing reputations and collectively agree on a social norm to assign
reputations to others based on their behavior. These coordination problems
exhibit multiple equilibria, some of which effectively establish cooperation.
When we train agents with a standard Q-learning algorithm in an environment
with reputation mechanisms, convergence to undesirable
equilibria is widespread. We propose two mechanisms to alleviate this: (i)
seeding a proportion of the system with fixed agents that steer others towards
good equilibria; and (ii) intrinsic rewards based on the idea of
introspection, i.e., augmenting agents' rewards by an amount proportional to
the performance of their own strategy against themselves. A combination of
these simple mechanisms is successful in stabilizing cooperation, even in a
fully decentralized version of the problem where agents learn to use and assign
reputations simultaneously. We show how our results relate to the literature in
Evolutionary Game Theory, and discuss implications for artificial, human and
hybrid systems, where reputations can be used as a way to establish trust and
cooperation.
Comment: Published in AAMAS'21, 9 pages
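The introspection reward described in (ii) can be sketched as follows: the agent's environment reward is augmented by a term proportional to how well its current strategy scores when played against a copy of itself. The function names, the weight LAMBDA and the tabular greedy policy below are assumptions for illustration, not the paper's formulation.

```python
# Toy Prisoner's Dilemma payoffs for the row player (illustrative values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
LAMBDA = 0.5  # weight of the introspective term (assumed value)

def greedy_action(q, state):
    # The agent's current greedy policy from tabular Q-values.
    return max("CD", key=lambda a: q.get((state, a), 0.0))

def introspection_bonus(q, state):
    # Score the agent's own strategy against a copy of itself:
    # both "players" take the same greedy action in the same state.
    a_self = greedy_action(q, state)
    return PAYOFF[(a_self, a_self)]

def augmented_reward(env_reward, q, state):
    # Environment reward plus the introspective intrinsic reward.
    return env_reward + LAMBDA * introspection_bonus(q, state)

# Example: with Q-values favouring cooperation, the bonus reinforces it.
q = {("C", "C"): 2.0, ("C", "D"): 0.5}
print(augmented_reward(3, q, "C"))  # 3 + 0.5 * 3 = 4.5
```

An agent whose strategy does well against itself (e.g. mutual cooperation) receives a larger bonus than one whose strategy does poorly against itself (e.g. mutual defection), which nudges learning away from the undesirable equilibria discussed above.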